Ocropodium: open source OCR for small-scale historical archives
Internal identifier: 000305 (Main/Exploration); previous: 000304; next: 000306
Authors: Tobias Blanke [United Kingdom]; Michael Bryant [United Kingdom]; Mark Hedges [United Kingdom]
Source:
- Journal of information science [0165-5515]; 2012.
Abstract
Large-scale digitization projects dealing with text-based historical material face challenges that are not well catered for by commercial software. This article discusses the results of a project to build a scalable OCR workflow for historical collections based on open source tools that is particularly tailored towards use in small-scale historical archives. It argues that open source tools allow for better customization to match these requirements, particularly with regard to character model training and per-project language modelling. We offer insights into our accuracy evaluation results of various open source OCR tools, as well as a case study about the challenges and opportunities of open source OCR in historical archives.
Affiliations: King's College London, United Kingdom (all three authors)
Links to previous steps (curation, corpus, ...)
- to stream PascalFrancis, to step Corpus: 000048
- to stream PascalFrancis, to step Corpus: 000073
- to stream PascalFrancis, to step Curation: 000720
- to stream PascalFrancis, to step Checkpoint: 000067
- to stream Main, to step Merge: 000308
- to stream Main, to step Curation: 000305
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Ocropodium: open source OCR for small-scale historical archives</title>
<author><name sortKey="Blanke, Tobias" sort="Blanke, Tobias" uniqKey="Blanke T" first="Tobias" last="Blanke">Tobias Blanke</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>King's College London</s1>
<s3>GBR</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Bryant, Michael" sort="Bryant, Michael" uniqKey="Bryant M" first="Michael" last="Bryant">Michael Bryant</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>King's College London</s1>
<s3>GBR</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Hedges, Mark" sort="Hedges, Mark" uniqKey="Hedges M" first="Mark" last="Hedges">Mark Hedges</name>
<affiliation wicri:level="1"><inist:fA14 i1="03"><s1>King's College London</s1>
<s3>GBR</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">13-0290838</idno>
<date when="2012">2012</date>
<idno type="stanalyst">PASCAL 13-0290838 INIST</idno>
<idno type="RBID">Pascal:13-0290838</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000048</idno>
<idno type="stanalyst">FRANCIS 13-0290838 INIST</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000073</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000720</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000067</idno>
<idno type="wicri:doubleKey">0165-5515:2012:Blanke T:ocropodium:open:source</idno>
<idno type="wicri:Area/Main/Merge">000308</idno>
<idno type="wicri:Area/Main/Curation">000305</idno>
<idno type="wicri:Area/Main/Exploration">000305</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Ocropodium: open source OCR for small-scale historical archives</title>
<author><name sortKey="Blanke, Tobias" sort="Blanke, Tobias" uniqKey="Blanke T" first="Tobias" last="Blanke">Tobias Blanke</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>King's College London</s1>
<s3>GBR</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Bryant, Michael" sort="Bryant, Michael" uniqKey="Bryant M" first="Michael" last="Bryant">Michael Bryant</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>King's College London</s1>
<s3>GBR</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Hedges, Mark" sort="Hedges, Mark" uniqKey="Hedges M" first="Mark" last="Hedges">Mark Hedges</name>
<affiliation wicri:level="1"><inist:fA14 i1="03"><s1>King's College London</s1>
<s3>GBR</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Journal of information science</title>
<title level="j" type="abbreviated">J. inf. sci.</title>
<idno type="ISSN">0165-5515</idno>
<imprint><date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Journal of information science</title>
<title level="j" type="abbreviated">J. inf. sci.</title>
<idno type="ISSN">0165-5515</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Large-scale digitization projects dealing with text-based historical material face challenges that are not well catered for by commercial software. This article discusses the results of a project to build a scalable OCR workflow for historical collections based on open source tools that is particularly tailored towards use in small-scale historical archives. It argues that open source tools allow for better customization to match these requirements, particularly with regard to character model training and per-project language modelling. We offer insights into our accuracy evaluation results of various open source OCR tools, as well as a case study about the challenges and opportunities of open source OCR in historical archives.</div>
</front>
</TEI>
<affiliations><list><country><li>Royaume-Uni</li>
</country>
</list>
<tree><country name="Royaume-Uni"><noRegion><name sortKey="Blanke, Tobias" sort="Blanke, Tobias" uniqKey="Blanke T" first="Tobias" last="Blanke">Tobias Blanke</name>
</noRegion>
<name sortKey="Bryant, Michael" sort="Bryant, Michael" uniqKey="Bryant M" first="Michael" last="Bryant">Michael Bryant</name>
<name sortKey="Hedges, Mark" sort="Hedges, Mark" uniqKey="Hedges M" first="Mark" last="Hedges">Mark Hedges</name>
</country>
</tree>
</affiliations>
</record>
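The record above uses the prefixes `wicri:` and `inist:` without declaring them, so a strict XML parser will likely reject it as it stands. The following is a minimal sketch, not part of Dilib, of one way to read the title and author names with Python's standard library. The namespace URIs are placeholders chosen for illustration (an assumption, since the record does not state them), the helper names `parse_record` and `summarize` are hypothetical, and the sample is an abbreviated copy of the record with a single author.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the record above; only the first author is kept.
RECORD = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Ocropodium: open source OCR for small-scale historical archives</title>
<author><name sortKey="Blanke, Tobias" first="Tobias" last="Blanke">Tobias Blanke</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>King's College London</s1></inist:fA14>
<country>Royaume-Uni</country></affiliation></author>
</titleStmt></fileDesc></teiHeader></TEI></record>"""

# Placeholder namespace URIs (assumption): the record itself declares none.
PLACEHOLDER_NS = 'xmlns:wicri="urn:example:wicri" xmlns:inist="urn:example:inist"'


def parse_record(text):
    """Bind the undeclared prefixes on the root element, then parse."""
    patched = text.replace("<record>", "<record " + PLACEHOLDER_NS + ">", 1)
    return ET.fromstring(patched)


def summarize(root):
    """Return the article title and a list of (last, first) author names."""
    stmt = root.find(".//titleStmt")
    title = stmt.findtext("title")
    authors = [(n.get("last"), n.get("first")) for n in stmt.findall("author/name")]
    return title, authors


root = parse_record(RECORD)
title, authors = summarize(root)
print(title)
print(authors)
```

Patching only the root element leaves the rest of the record untouched, so the same two helpers should also work on the full record as long as its root is still a bare `<record>` tag.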
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000305 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000305 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:13-0290838 |texte= Ocropodium: open source OCR for small-scale historical archives }}
This area was generated with Dilib version V0.6.32.